DNA replication origins are necessary for the duplication of genomes. In addition, plasmid-based expression systems 
require DNA replication origins to maintain plasmids efficiently. The yeast autonomously replicating sequence (ARS) 
assay has been a valuable tool in dissecting replication origin structure and function. However, the dearth of information 
on origins in diverse yeasts limits the availability of efficient replication origin modules to only a handful of species and 
restricts our understanding of origin function and evolution. To enable rapid study of origins, we have developed a 
sequencing-based suite of methods for comprehensively mapping and characterizing ARSs within a yeast genome. Our 
approach finely maps genomic inserts capable of supporting plasmid replication and uses massively parallel deep mu- 
tational scanning to define molecular determinants of ARS function with single-nucleotide resolution. In addition to 
providing unprecedented detail into origin structure, our data have allowed us to design short, synthetic DNA sequences 
that retain maximal ARS function. These methods can be readily applied to understand and modulate ARS function in 
diverse systems. 

[Supplemental material is available for this article.] 



The application of genomic tools to classical genetic techniques 
has led to a rapid expansion of our understanding of biological 
processes at a systematic, global level. Microbial systems such as 
yeast are particularly well-suited for the merging of these methods 
due to our ability to perform experiments on a large scale. Recent 
advances have transformed the study of basic principles such as 
genetic interactions (Costanzo et al. 2011) and structural features 
of DNA elements (Sharon et al. 2012). In this study, we have de- 
veloped genomic tools for comprehensively mapping and dissect- 
ing replication origins in yeast using simple screening techniques 
that can be readily extended to other DNA elements. 

Origins of DNA replication act as sites of initiation of DNA 
replication via recruitment of the origin recognition complex 
(ORC) and other proteins necessary for the duplication of the ge- 
nome in every cell cycle (Sclafani and Holzen 2007). While origins 
on the whole are an essential part of every genome, functional 
redundancy of these noncoding elements subjects them to a 
poorly understood set of evolutionary forces. A comprehensive 
understanding of origin location and structure across multiple 
species would shed light on the interaction between DNA replication
dynamics and genome structure as well as on the coevolution
of origin sequences and origin-interacting proteins.

Yeast origins promote replication and maintenance of episomal
plasmids as cis-acting autonomously replicating sequences
(ARSs) (Stinchcomb et al. 1979). This property is essential for
plasmid-based expression systems (Parent et al. 1985; Boer et al.
2007) and has been widely used to map origins and dissect their
functional domains, uncovering a diversity of molecular determinants
of ARS function among different yeast species. Examples
include Saccharomyces cerevisiae, which uses a short (11-17 bp) 
T-rich ARS Consensus Sequence (ACS) motif for ARS function 
(Broach et al. 1983); Kluyveromyces lactis, whose ACS motif is 50 bp 
(Liachko et al. 2010); and fission yeast Schizosaccharomyces pombe, 
whose origins do not seem to have a consensus motif, instead 
using a more promiscuous AT-binding property of ORC (Chuang 
and Kelly 1999). Recent work also suggests a diversity of origin 
sequences among Schizosaccharomyces yeasts (Xu et al. 2012). 
In some cases, species that are relatively closely related use dif- 
ferent consensus sequences (Liachko et al. 2011; Di Rienzi et al. 
2012). 

Yeast replication origins can also be characterized through 
experiments focused on the dynamics of chromosome replica- 
tion (Raghuraman et al. 2001; Yabuki et al. 2002; Smith and 
Whitehouse 2012). While these studies are useful in detailing the 
temporal order of events during genome replication, they fall short 
of generating a complete map of potential origin sites, due to low 
resolution and variability in origin usage and efficiency in differ- 
ent cells within a population. Deletion of all known active origin 
sites on a chromosome does not completely abrogate replication 
(Dershowitz et al. 2007), suggesting the presence of cryptic origins 
whose chromosomal replication initiation signal is too weak to 
be detected in population-based assays. Thus, ARS mapping and 
functional dissection remain the most precise tools for under- 
standing the molecular determinants of yeast origin function. 
However, a lack of methods to comprehensively identify and dis- 
sect ARSs has slowed progress because origins are typically studied 
in small numbers. Thus, despite decades of study, >30% of sus- 
pected S. cerevisiae origins remain unconfirmed (Nieduszynski et al. 
2007; Siow et al. 2012). 

We have developed an approach that couples high-throughput
ARS screening with deep sequencing to map ARSs, delineate their
functional regions, and measure the effect of all possible point
mutations on individual ARS fragments using massively parallel
mutational scanning. Our approach yields the most comprehensive,
high-resolution S. cerevisiae ARS data set to date and can be
readily applied to other yeast strains and species. Using our data,
we have designed and tested a 100-bp ARS fragment that is able to
maintain episomal plasmids with much greater stability than wild
type and acts as an improved replication origin in its native
genomic context. Such improved replication origin modules can be
useful for regulating genomic replication as well as for increasing
the stability of plasmids in diverse strains and species of yeast.

Genome Research 23:698-704 © 2013, Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/13; www.genome.org



Results 

High-throughput mapping of genomic 
ARS locations 

To obtain a complete map of ARS locations
in S. cerevisiae, we generated a >12×
genomic library of overlapping restric- 
tion fragments cloned into a shuttle vec- 
tor that lacks ARS function but contains 
a URA3 selectable marker. Transformation- 
competent ura3 yeast were transformed 
with this library and selected for growth 
on medium lacking uracil. Colony for- 
mation requires the replication and 
propagation of the plasmid and allows 
the recovery of ARS-bearing plasmids 
(Fig. 1A). Plasmid inserts were amplified 
using vector-specific primers and iden- 
tified en masse using paired-end deep 
sequencing. 

Our screen, which we have named
"ARS-seq," yielded 720 distinct DNA
fragments (median length: 702 bp) rep- 
resenting 366 unique genomic loci (227 
loci were overlapped by multiple fragments)
(Supplemental Table 1). To better
define the false-positive (FP) and false-negative (FN) rates associated
with ARS-seq, we manually cloned selected DNA fragments
(Supplemental Analysis) and tested them for ARS activity by 
transforming yeast and monitoring colony formation on selective 
medium. The majority of our ARS-seq data (263 ARSs) overlapped 
with regions previously annotated as confirmed ARSs in the OriDB 
database (Siow et al. 2012), and a further 58 ARSs overlapped with 
regions annotated as "likely ARSs." Manual validation drastically 
increased the data set's accuracy and coverage and identified a 
further 48 unique ARS loci. Our findings suggest FP and FN rates
of ~12% for ARS-seq (Supplemental Fig. 1; Supplemental Analysis).
Our data are also consistent with a recently published partially
overlapping set of manually validated ARS candidates
(Muller and Nieduszynski 2012), further underscoring the value 
of an unbiased screening tool to map genomic DNA elements 
comprehensively.

Figure 1. High-throughput, high-resolution mapping of ARSs using ARS-seq and miniARS-seq. 

(A) Genomic libraries were constructed in a URA3 vector lacking an ARS. Yeast were transformed with 
these libraries for selection of ARS-containing plasmids. ARS plasmids were isolated from pooled yeast 
colonies and sequenced using insert-flanking primers (ARS-seq, top row). Inserts from the ARS-seq 
plasmid pools were amplified using vector-specific primers. Randomly sheared and size-selected frag- 
ments were cloned into an ARS-less vector and rescreened for ARS function (miniARS-seq, bottom row). 

(B) A sample locus (at ARS419) comparing results of ARS-seq (red highlight), miniARS-seq (blue high- 
light), and OriDB annotation (purple highlight). The best match of the ACS motif is indicated at the top 
(red vertical line). Corresponding coordinates on chromosome 4 and annotated nearby genes are 
shown at the bottom. (C) Size distributions of ARS-seq and miniARS-seq fragments listed in Supple- 
mental Tables 1 and 2. (D) Distribution of the size differences between OriDB-annotated confirmed ARSs 
and the corresponding shortest ARS-seq/miniARS-seq fragments or inferred functional cores (see 
Methods). 



Delineating essential ARS regions using miniARS-seq 

While the average length of ARS fragments identified by ARS-seq 
was 702 bp, it is known that the regions required for ARS function 
in S. cerevisiae (as well as in several other yeasts) can be <100 bp 
(Sclafani and Holzen 2007; Liachko et al. 2010, 2011). We de- 
veloped the miniARS-seq method to identify the minimal func- 
tional regions of ARSs en masse. We PCR-amplified all ARS inserts 
from the plasmid pools isolated from the ARS-seq experiment. We 
used DNase I to randomly shear the inserts and used gel purifica- 
tion to isolate fragments in the 100-bp to 200-bp range. These 
short subfragments of ARSs were cloned into the ARS-less URA3 
vector and used to transform yeast to select for fragments that re- 
tain ARS activity (Methods). ARS plasmids were extracted from 
yeast in high numbers, and the inserts were sequenced to map 
overlapping short ARS fragments (Fig. 1A).

After stringent filtering to remove FPs (for details on data filtering,
see the Supplemental Analysis), our experiments yielded
a total of 12,338 unique genomic miniARS fragments (median 
length: 147 bp) representing 181 unique ARS regions (Supplemental 
Table 2). The recovery of miniARS-seq fragments correlated with 
coverage in the original ARS-seq screen (Supplemental Fig. 2), as well 
as the score of the ACS match within the ARS. We used information 
from overlapping fragments to infer functional cores of miniARS- 
seq contigs, which further delineated minimal functional ARS 
regions (median length: 92 bp) (Fig. 1B,C). Subsequent manual 
validation suggested a false-positive rate of 3.9% prior to validation 
(Supplemental Tables 3, 4; Supplemental Analysis). Combining the 
data from a single ARS-seq and miniARS-seq experiment generated 
an ARS map with much higher resolution than the combined ARS/ 
origin data set curated by OriDB (Fig. 1D). While it is known that
elements flanking the ACS are necessary for ARS function, our data 
also suggest that in most cases essential flanking elements reside 
on the 3' side of the T-rich strand of the ACS (Supplemental 
Analysis). To test the sufficiency of a 3'-extended ACS for ARS
function, we manually validated four unique ARS fragments, each
<35 bp in length, that contain an ACS and a flanking B1 element
and found that three of the four retained ARS function, although it was
strongly attenuated in one case (Supplemental Analysis).

Deep mutational scanning of ARS1 

The contribution of specific nucleotides to ARS function can be 
elucidated by mutagenesis experiments. However, such experiments
are limited by throughput. To test the functional consequences of
all single substitution mutations on a given ARS in a massively 
parallel fashion, we developed mutARS-seq — a deep mutational 
scanning approach coupled with high-throughput sequencing 
(Patwardhan et al. 2009; Fowler et al. 2010; Haberle and Lenhard 
2012), which we applied to a 100-bp fragment of the well-studied 
ARS1 for method validation. This fragment contains the ACS as 
well as the B1 and B2 elements essential for ARS1 function and is
sufficient to support plasmid replication. We used a randomly 
mutagenized oligonucleotide to generate a library of ARS1 mutant 
variants in an ARS-less vector. The resulting library contained 
>22,000 ARS1 inserts. Every position within each individual ARS1 
insert had a 2% chance of bearing a mutation. These variant 
plasmids were used to transform yeast in large numbers, and the 
resulting library was directly competed in liquid culture. The 
abundance of each variant at different times in the competition 
was measured by 101-bp paired-end deep sequencing and was used 
to calculate a relative fitness value for each mutant allele (Fig. 2A). 
The resulting library yielded data for all 300 possible single sub- 
stitutions, deletions at most positions, and combinations of these 
mutations. 
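
As a rough check on the library design described above, the number of mutations per insert can be modeled with a binomial distribution. The sketch below assumes that each of the 100 positions mutates independently with probability 0.02 and that every mutation is a substitution to one of the three other bases. Both are simplifying assumptions, and the function name is illustrative only.

```python
from math import comb

FRAGMENT_LENGTH = 100   # mutagenized positions in the ARS1 fragment
MUTATION_RATE = 0.02    # per-position mutation probability

def mutation_count_pmf(k, length=FRAGMENT_LENGTH, rate=MUTATION_RATE):
    """Probability that one insert carries exactly k mutations,
    assuming independent mutation at every position (binomial model)."""
    return comb(length, k) * rate**k * (1 - rate) ** (length - k)

# Each position can change to three other bases, giving 100 * 3 variants.
n_single_substitutions = FRAGMENT_LENGTH * 3

# Mean of the binomial distribution: mutations expected per insert.
expected_mutations = FRAGMENT_LENGTH * MUTATION_RATE

if __name__ == "__main__":
    print("single substitutions:", n_single_substitutions)
    print("expected mutations per insert:", expected_mutations)
    for k in range(4):
        print(k, round(mutation_count_pmf(k), 4))
```

Under this model an insert carries two mutations on average, roughly 13% of inserts are wild type, and roughly 27% carry exactly one mutation, so most variants in the library are expected to carry more than one change.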

Our data showed strong depletion of plasmids with nucleotide
substitutions and deletions at the ACS, B1, and B2 domains
of the ARS1 fragment (Fig. 2B; Supplemental Fig. 6). In addition,
several positions showed a preference for mutations over wild-type
nucleotides, the most striking example being a 9-bp region between
the B1 and B2 elements and two positions within the core
ACS (Fig. 2B). Closer inspection of the sequence preference surrounding
the B2 element revealed that the optimal combination
of nucleotides at this locus would re-create a perfect 11-bp ACS
match on the reverse strand, as previously predicted (Wilmes and
Bell 2002). Analysis of variants carrying two substitutions revealed
synergistic relationships between nucleotides in the ACS,
B1, and B2 elements, while most other substitution combinations
showed weak epistatic (nonadditive) effects (Supplemental Analysis).
Analysis of alleles bearing deletions further demonstrated the
functional importance of the ACS, B1, and B2 elements. Surprisingly,
there was a strong functional benefit to variants with single-base
deletions between the ACS and the B2 element (Supplemental
Fig. 6). Note that these beneficial deletions also occur
between the core ACS and the TTT of the B1 element. However,
this does not contradict the presumably rigid distance between
the core ACS and the B1 element, since there is an "extra" T following
the TTT of B1, which in the case of such a deletion would join
the last two T's of the original B1 to give the same TTT. Our findings
show that mutARS-seq is effective at measuring effects of mutations
across the ARS as well as probing for positional effects and
spacing constraints.

Figure 2. Comprehensive mutational scanning of a 100-bp fragment of ARS1 using mutARS-seq. (A) A library of randomly mutagenized ARS1 fragments
was cloned into a URA3 vector. Yeast were transformed with these libraries and competed in liquid batch growth. (B) The log2 of the enrichment ratio is
shown for all substitution mutations within the ARS1 fragment (top: average; bottom: individual nucleotide substitutions). (Blue) Previously described ACS,
B1, and B2 elements. (Red box) A region of nucleotides that repress wild-type ARS1 function. The data shown are the composite of multiple samples as
described in the Supplemental Analysis.

Using mutARS-seq data to optimize ARS1 function

The data from mutARS-seq can be used to optimize the function
of ARS1. We constructed a synthetic DNA fragment consisting of
nucleotides found to be the most beneficial for ARS1 function at
each position. This 100-bp sequence (ARS1max) (Supplemental
Fig. 8) contained 53 mutations relative to the wild-type ARS1 sequence
(Supplemental Fig. 8a) and promoted faster growth in selective
medium than either its wild-type equivalent or ARS1hi1
(the best-performing allele identified by mutARS-seq) (Supplemental
Fig. 8), indicating increased ARS activity (Fig. 3A). DNA
fragments bearing ARS1 and ARS1max were cloned into CEN
plasmids, and ARS activity was quantified using the minichromosome
maintenance assay, which measures plasmid loss per generation
in nonselective growth conditions (Donato et al. 2006).
The 100-bp ARS1max sequence had a drastically reduced plasmid
loss rate relative to the original ARS1, indicating a level of ARS
efficiency comparable to a 7-kb fragment of the efficient ARS121
(systematic name: ARS1021) (Fig. 3B; Supplemental Fig. 8b; Walker
et al. 1990). Extending the length of the ARS fragments to 2.5 kb
significantly increased function of the wild-type ARS1, but had
very little effect on ARS1max, suggesting that the 100-bp ARS1max
has reached maximal ARS efficiency (Fig. 3B). However, the 2.5-kb
DNA fragment bearing ARS1max still showed a lower plasmid loss
rate than the 2.5-kb ARS1 fragment. We also found ARS1max to be
largely resistant to the effects of mcm4-chaos3, an oncogenic allele
of MCM4, a component of the replicative DNA helicase (Fig. 3B;
Shima et al. 2007). Since the sequence of ARS1max contains a reverse
complement ACS match at the position where the B2 element
is located in ARS1, it is possible that ARS1max is no longer
dependent on the original ACS for function. We tested this hypothesis
by mutating a critical TT dinucleotide in the original ACS
(Supplemental Fig. 9). Our results show that mutating the original
ACS abolishes the ARS function of ARS1, but does not abolish the
ARS function of ARS1max. This finding suggests that ARS1max may
be a dimeric ARS (Bolon and Bielinsky 2006).
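
The design rule stated at the start of this section, choosing the most beneficial nucleotide at each position, can be sketched as a per-position maximum over enrichment scores. The example below is a minimal illustration that assumes positions are treated independently and that the wild-type base has a log2 enrichment of 0 by definition. The function name, the score layout, and the toy values are hypothetical and are not taken from the study's data.

```python
def design_max_sequence(wild_type, scores):
    """Return the sequence built from the best-scoring base at each position.

    wild_type: wild-type sequence as a string of A/C/G/T.
    scores: dict mapping (position, base) -> log2 enrichment of that
            substitution relative to wild type (0-based positions).
    The wild-type base is kept unless a substitution scores above 0.
    """
    designed = []
    for pos, wt_base in enumerate(wild_type):
        best_base, best_score = wt_base, 0.0
        for base in "ACGT":
            if base == wt_base:
                continue
            score = scores.get((pos, base))
            if score is not None and score > best_score:
                best_base, best_score = base, score
        designed.append(best_base)
    return "".join(designed)

if __name__ == "__main__":
    # Toy 4-bp example with made-up scores.
    toy_scores = {(0, "T"): 1.2, (0, "G"): 0.3, (1, "A"): -2.0, (3, "A"): 0.1}
    designed = design_max_sequence("ACGT", toy_scores)
    changes = sum(a != b for a, b in zip("ACGT", designed))
    print(designed, changes)
```

A per-position rule like this ignores epistasis between positions, which is one reason the designed sequence still needs to be tested experimentally, as was done for ARS1max.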

To assay ARS1max function in a genomic context, we made
a strain in which ARS1 was replaced with ARS1max. DNA two-dimensional
(2D) gel analysis (Brewer and Fangman 1987) showed
an increase in origin firing at the ARS1 locus in the ARS1max strain
(Fig. 3C). Our data show that while in the wild-type ARS1 strain
5.9%-6.2% of the selected replicating intermediate signal comes
from the Y-arcs, in the ARS1max strain 2.1%-2.5% of the signal
comes from the Y-arcs (Supplemental Fig. 10). While there is no
definitive way of converting these measurements into an actual
number of firing events, combined, our results indicate that synthetic
ARS1max is much more efficient than the wild-type ARS1
both in plasmid and genomic contexts.

Figure 3. Optimized synthetic ARS function. (A) ARS1 and ARS1max
sequences were cloned into a URA3 vector. Yeast were transformed with
resultant plasmids and grown for 2 d on medium lacking uracil. Colony
size is representative of ARS function. (B) Plasmid loss assays were performed
on CEN vectors containing ARSs indicated: (+) ARS1; (HI) ARS1hi1;
(MAX) ARS1max1.1; (121) ARS121. A control vector YCP121, bearing a
7-kb ARS121 fragment is shown (121) as an example of very strong ARS
function. (*) The loss rate of wild-type ARS1 in mcm4-chaos3 cells (mcm4)
was too high to be measured. (C) Genomic DNA replication was assayed
in strains with ARS1 and ARS1max integrated into the native chromosomal
locus. The signal ratio of the top (bubble) arcs to the bottom (Y) arcs is
indicative of replication initiation at the assayed locus.

Discussion 

Combining straightforward genetic tools with large-scale genomic 
analysis drastically increases scientific productivity and can lead to 
discoveries inaccessible by traditional means. In yeast, the ARS 
assay is a simple approach that provides the ability to dissect with 
fine resolution DNA sequences that are sufficient to initiate DNA 
replication. While genomic loci are subject to regulation that may 
differ from plasmid-based sequences, understanding ARSs, the 
smallest functional unit of replication origins, can serve as a platform
for further understanding these sequences in their complex
chromosomal context.

While the first budding yeast ARS was discovered more than 
three decades ago (Stinchcomb et al. 1979), a complete map of 
ARS locations is not yet available despite laborious efforts to 
manually validate large numbers of ARSs (Donato et al. 2006; 
Nieduszynski et al. 2006; Muller and Nieduszynski 2012). Our 
methodology combines ARS screening with next-generation sequencing
to enable the generation of a comprehensive ARS map
with an average resolution of <200 bp. The ARS assay is useful for
the study of replication origins in yeast species diverged >500 
million yr, so we expect that these methods will be portable to 
a variety of strains and species. Complete ARS maps allow for 
evolutionary analysis of origin structure and genomic distribu- 
tion, while simultaneously providing data sets for the study of 
mechanistic features of origin selection. Generating such maps 
for the other sequenced yeast species will add a new dimension 
to our understanding of how replication origins affect genome 
biology. 

In addition to delineating all ARS sequences within a genome, 
understanding the functional contribution of each nucleotide 
within an ARS is an important goal. Traditional mutagenesis 
techniques require the laborious task of testing individual mutant 
alleles for phenotypic effects. These approaches are usually lim- 
ited in throughput and sensitivity, lowering the number of alleles 
that can be tested and the amount of information that can be 
gathered from each mutation. Our deep mutational scanning 
approach (mutARS-seq) allows the mapping of the effects of all 
mutations at all positions in a given ARS sequence. The synthesis 
of ARS1max underscores the value of such data, providing enough
knowledge of ARS1 structure to allow deliberate modulation of its 
activity. This method can be easily applied to other origins, either 
to elucidate mechanistic features of ARS function in other species 
or to understand structure-function relationships in different 
origins within the same species. In addition, approaches similar 
to ones presented here can be adapted for the mapping and dis- 
section of other DNA elements, such as promoters, genes, and 
centromeres. 

Methods 

Reagents 

All strains and primers are listed in Supplemental Table 5. Genomic 
DNA was isolated using the YeaStar Genomic DNA Kit (Zymo Re- 
search). PCR purification and purification of digested plasmids 
were performed using the DNA Clean and Concentrator-5 Kit 
(Zymo Research). All enzymes used were from New England Bio- 
labs unless otherwise noted. All primers were purchased from IDT, 
unless otherwise noted (for a complete list of oligos, see Supple- 
mental Table 5). The transformation host yeast strain used in all 
experiments was W303-1A. All yeast growth was performed at 
30°C; all bacterial growth was performed at 37°C. Bacterial trans- 
formants were selected on standard LB medium with ampicillin 
(100 µg/mL). Yeast transformants were selected on standard synthetic
complete medium lacking uracil.

Illumina library construction 

Illumina adapter-containing primers that anneal within the vector 
sequences outside of the insert were used to amplify the relevant 
DNA, using the high-fidelity enzyme Phusion HF (15 cycles of 
amplification). Gel extraction was used to isolate fragments of
appropriate length for each experiment, followed by purification
with the DNA Clean and Concentrator-5 kit (Zymo Research). ARS- 
seq and miniARS-seq experiments were sequenced using primers 
OCA275 and OCA276, or IL575 and OCA276. 

Vector construction 

The vectors used for this study were derivatives of subcloning 
vector pRS406. An MscI site or an NruI site was inserted into the
BamHI site of pRS406 to create pIL19 and pIL22, respectively. Full
vector sequences are available upon request. 

Construction and screening of ARS-seq libraries 

Genomic DNA from a ρ⁰ derivative of FY4 (prototroph, S288C
strain background) was purified using the YeaStar Genomic DNA
Kit. Genomic DNA was digested to completion with one of four
four-cutter restriction enzymes: MboI, AluI, HaeIII, or RsaI. To
prevent insert concatemerization, fragmented DNA was treated
with Antarctic Phosphatase and purified with the DNA Clean and
Concentrator-5 Kit prior to ligation. DNA digested with MboI was
ligated into the BamHI site of pRS406. DNA digested with AluI,
HaeIII, and RsaI was ligated into the MscI site of pIL19 and the NruI
site of pIL22 (each insert DNA pool was cloned into both vectors
separately). To maximize cloning efficiency, each ligation reaction
was purified using DNA Clean and Concentrator-5 columns and
digested with the vector cloning restriction enzyme (BamHI, MscI,
and NruI for pRS406, pIL19, and pIL22, respectively).
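
The complete four-cutter digestion described above can be mimicked in silico, which is also how sequenced inserts can later be matched to expected restriction fragments. The sketch below uses the standard recognition sites of the four enzymes (MboI ^GATC, AluI AG^CT, HaeIII GG^CC, RsaI GT^AC) and treats the input as a linear top-strand sequence. The function name and the toy sequence are illustrative assumptions, not part of the published pipeline.

```python
import re

# Recognition site and top-strand cut offset within the site.
ENZYMES = {
    "MboI": ("GATC", 0),    # ^GATC, leaves a BamHI-compatible overhang
    "AluI": ("AGCT", 2),    # AG^CT, blunt
    "HaeIII": ("GGCC", 2),  # GG^CC, blunt
    "RsaI": ("GTAC", 2),    # GT^AC, blunt
}

def digest(seq, enzyme):
    """Return the fragments of a linear sequence after complete digestion."""
    site, offset = ENZYMES[enzyme]
    # Lookahead pattern so that every site occurrence is reported.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    # Skip empty fragments, e.g. when a cut falls at position 0.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

if __name__ == "__main__":
    toy = "AAAGATCTTTAGCTCC"
    for name in ENZYMES:
        print(name, digest(toy, name))
```

Because the fragments always concatenate back to the input sequence, the output can be checked by joining it and comparing with the original.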

Ligation products were used to transform Alpha-Select Gold 
Efficiency competent Escherichia coli cells (Bioline). Cloning effi- 
ciency and insert sizes were checked using colony PCR on random 
E. coli colonies. Plasmid DNA was purified using the Wizard Plus SV 
Miniprep Kit (Promega). Total library coverage was ~12× genome
size (~3× for each restriction enzyme pool). Library transformations
of yeast were conducted using a standard lithium acetate
method and plated onto complete synthetic agar medium lacking 
uracil. The host strain for transformations was W303-1A. Yeast 
colonies were grown for 3 d at 30°C. To enrich for ARS plasmids 
and to eliminate nontransformed cells, yeast colonies were replica- 
plated onto fresh -Ura plates and grown for two more days at 30°C. 
Plasmids were extracted from pooled yeast colonies by glass bead 
disruption, followed by DNA purification using Wizard Plus SV 
Miniprep Kit columns. To remove genomic yeast DNA and to fa- 
cilitate the purification of individual plasmids for further testing, 
the extracted total DNA was used to transform E. coli, resultant 
ampicillin-resistant colonies were scraped and pooled, and plas- 
mids were extracted. Forty-eight ARS clones were individually 
purified, Sanger-sequenced (primers IL429 and IL430), and used to 
retransform yeast to confirm function. 

Construction and screening of miniARS-seq libraries 

ARS inserts were amplified from purified ARS-seq plasmid pools 
using primers IL594 and OCA272. DNA was sheared using DNase I 
(Roche) and resolved on 2% agarose gels. Fragments corresponding 
to 100-200 bp in size were cut out of the gel and purified using the 
GenElute minus EtBr (Sigma-Aldrich) and the DNA Clean and
Concentrator-5 kits. Ends of sheared DNA fragments were made
blunt using the Klenow fragment of DNA polymerase I, dephos- 
phorylated with Antarctic Phosphatase, and purified with the 
DNA Clean and Concentrator-5 Kit. Insert DNA was ligated into 
vectors pIL19 and pIL22 separately, as above. Library coverage 
was calculated using colony PCR. Approximately 100,000 clones 
bearing inserts were screened in total. Yeast transformations and 
plasmid extractions were performed as above. Plasmids extracted
from the first round of miniARS-seq were used to transform yeast
again to remove false positives that passed through the first 
screen. ARS clones were individually purified, Sanger-sequenced 
(primers IL429 and IL430), and used to retransform yeast to con- 
firm function. 

Construction and screening of mutARS-seq libraries 

Oligo mutARS1_trilink was synthesized by Trilink Biotechnologies.
The central 100 bp was randomly mutagenized at a frequency
of 2% at each position. The outer 12 bp at each end of the
oligo was not mutagenized and contained homology for amplification
primers and an MscI restriction site for cloning. Primers
IL585 and IL586 were used to amplify the oligo mutARS1_trilink.
The resulting fragment was digested with MscI, phosphatase-treated,
and ligated into the NruI site of pIL22 as above. Colony
PCR on sets of random colonies was used to estimate library cov- 
erage; 22,000-24,000 insert-bearing colonies were pooled together 
for use in the screen. Sanger sequencing of a subset of clones was 
used for quality control. 

The plasmid library was used to transform yeast as above. 
Yeast were grown for 3 d at 30°C, at which point plates were 
scraped and inoculated into a 1-L culture of medium lacking uracil. 
Samples were taken after 12 and 20 h. Total DNA was purified and 
used as template for primers IL594 and one of the barcoded primers 
IL576, IL589, IL590, and IL593. Fragments of appropriate length 
were gel-purified and sequenced using primers IL591 and IL592. 

Manual validation of ARS function 

DNA segments to be tested were amplified by PCR with primers 
containing appropriate restriction sites (BamHI, MscI, or NruI).
The resulting fragments were cloned into either pRS406, pIL19, or 
pIL22 (depending on restriction sites present within the insert), 
Sanger-sequenced, and used to transform yeast. Growth on syn- 
thetic complete medium lacking uracil was assayed after 3 d at 30°C. 

Construction and characterization of ARS1max plasmids
and strains

Synthetic ARS1max sequences were constructed by PCR fusing
overlapping primers ARS1max_F and ARS1max_R. (The ARS1max
mutant noted in Supplemental Fig. 9 was cloned using primers
ARS1maxGG_F and ARS1max_R.) The resulting insert was digested
with MscI and cloned into pIL22 and pIL07 (Liachko et al.
2010), a centromeric vector used for plasmid loss assays. Plasmid
loss assays were performed as described (Donato et al. 2006). Long
ARS1max fragments were constructed using standard PCR fusion
techniques (the mutant ARS1 noted in Supplemental Fig. 9 was 
constructed using primers IL721 and IL722) and either cloned into
appropriate vectors or integrated, replacing the ARS1 sequence 
using a standard pop-in/pop-out method in the FY3 strain. The 
integrated ARS1max strain was backcrossed to FY3 (resulting
in strain ILY506). Genomic 2D gel experiments were performed 
as described (Brewer and Fangman 1987). Analysis of replication 
intermediates on 2D gels was performed using the Quantity 
One v.4.6.9 software on the Personal Molecular Imager (Bio-Rad) 
(Supplemental Fig. 10). 
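
For readers unfamiliar with plasmid loss assays, the per-generation loss rate is commonly estimated from the fraction of plasmid-bearing cells before and after a period of nonselective growth. The sketch below uses the widely used formulation 1 - (F/I)^(1/N), where I and F are the initial and final fractions of plasmid-bearing cells and N is the number of generations. Whether the cited protocol uses exactly this formula is an assumption, and the function names and input values are illustrative.

```python
import math

def generations(initial_density, final_density):
    """Number of population doublings between two cell densities."""
    return math.log2(final_density / initial_density)

def plasmid_loss_rate(initial_fraction, final_fraction, n_generations):
    """Fraction of plasmid-bearing cells that lose the plasmid per generation.

    initial_fraction, final_fraction: fraction of cells carrying the plasmid
    before and after nonselective growth (e.g., colonies on selective plates
    divided by colonies on nonselective plates).
    """
    return 1 - (final_fraction / initial_fraction) ** (1 / n_generations)

if __name__ == "__main__":
    n = generations(1e5, 1.024e8)             # 10 doublings
    rate = plasmid_loss_rate(0.90, 0.54, n)   # made-up fractions
    print(round(n, 2), round(rate, 4))
```

A lower loss rate indicates a more efficient ARS, because a plasmid that replicates reliably in every cell cycle is lost less often during nonselective growth.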

ARS-seq data analysis 

For more detailed descriptions of computational data analysis, 
see the Supplemental Analysis. ARS-seq sequencing reads were 
mapped using Bowtie version 0.12.7 to the October 2003 version
of the yeast genome (sacCer1) to correspond with the coordinate
system used by OriDB. Custom scripts for filtering and other
analyses were written in Python. All ACS positions and scores were 
determined by the GIMSAN and SADMAMA motif analysis tools 
(Keich et al. 2008; Ng and Keich 2008). 

Assigning the mapped read pairs to fragments generated by 
the four-cutter restriction enzymes used to fragment the insert 
DNA yielded 926 unique contiguous fragments. Quality score fil- 
tering and removing fragments that had a combined read count of 
1 left 720 fragments (Fig. 1C; Supplemental Table 1), which as- 
sembled into 366 contigs. To improve the resolution of ARS-seq, 
we inferred functional cores of ARSs using a dynamic program- 
ming approach that requires each core to be at least 50 bp long and 
keeps track of the resulting read depths of all core segments. ARS 
candidates selected for manual validation are described in the 
Supplemental Analysis. 
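
The assembly of overlapping fragments into contigs mentioned above can be illustrated with a standard interval-merging pass. This is a minimal sketch that assumes fragments are given as half-open (chromosome, start, end) tuples and that two fragments belong to the same contig when they overlap by at least 1 bp. The actual criteria used in the study are described in its Supplemental Analysis, and the function name and coordinates here are hypothetical.

```python
def merge_into_contigs(fragments):
    """Merge overlapping fragments into contigs.

    fragments: iterable of (chrom, start, end) with half-open coordinates.
    Returns a list of (chrom, start, end) contigs sorted by chromosome
    name and start position.
    """
    contigs = []
    for chrom, start, end in sorted(fragments):
        if contigs and contigs[-1][0] == chrom and start < contigs[-1][2]:
            # Overlaps the current contig: extend its end if needed.
            last_chrom, last_start, last_end = contigs[-1]
            contigs[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            # New chromosome or a gap: start a new contig.
            contigs.append((chrom, start, end))
    return contigs

if __name__ == "__main__":
    toy_fragments = [
        ("chr4", 100, 800),
        ("chr4", 500, 1200),
        ("chr4", 3000, 3700),
        ("chr2", 50, 700),
    ]
    print(merge_into_contigs(toy_fragments))
```

Sorting first means each fragment only needs to be compared with the most recent contig, so the pass is linear after the sort.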

miniARS-seq data analysis 

For more detailed descriptions of computational data analysis, see 
the Supplemental Analysis. Mapping of the reads was done using 
Bowtie and the sacCer1 version of the S. cerevisiae genome as above.
The above filtering resulted in 12,338 unique miniARS fragments 
that were assembled into 181 unique contigs (the average contig 
consisted of 68 overlapping fragments) (Supplemental Table 2). 
Inferred functional core segments were assigned in a slightly dif- 
ferent way than for ARS-seq, by defining the endpoints as the 0.05 
quantile position of endpoints from all fragments within a contig 
on both sides closest to the center of the contig subject to the 
constraint that the resulting contig is at least 50 bp long. Manual 
validation targets are discussed in the Supplemental Analysis. 
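
One possible reading of the core-inference rule above is sketched here. It assumes that the left edge of the core is set by fragment start positions and the right edge by fragment end positions, that the 5% of endpoints lying closest to the contig center on each side are skipped, and that a core shorter than 50 bp is widened symmetrically around its midpoint. These interpretive choices, along with the function name and toy coordinates, are assumptions rather than the published implementation.

```python
def infer_core(fragments, q=0.05, min_len=50):
    """Infer a functional core from overlapping fragments of one contig.

    fragments: list of (start, end) half-open coordinates.
    q: fraction of the most inward endpoints to skip on each side.
    min_len: minimum allowed core length in bp.
    """
    k = int(q * len(fragments))  # number of inward endpoints skipped per side
    starts = sorted((s for s, _ in fragments), reverse=True)  # most inward first
    ends = sorted(e for _, e in fragments)                    # most inward first
    left, right = starts[k], ends[k]
    if right - left < min_len:
        # Widen symmetrically around the midpoint to reach min_len.
        mid = (left + right) // 2
        left = mid - min_len // 2
        right = left + min_len
    return left, right

if __name__ == "__main__":
    # Thirty staggered 200-bp toy fragments.
    toy = [(i, 200 + i) for i in range(30)]
    print(infer_core(toy))
```

Skipping a small fraction of the most inward endpoints keeps a few aberrant short fragments from shrinking the inferred core.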

mutARS-seq data analysis 

More detailed descriptions of computational data analysis can be 
found in the Supplemental Analysis. Reads were mapped using 
Bowtie2 (Langmead and Salzberg 2012). The reference sequence 
used was S288C background ARS416 (chr4, 462510-462609). Cus- 
tom Python scripts were used to combine overlapping reads into 
a single variant sequence taking into account quality score in- 
formation at each position. The enrichment value for each variant 
was calculated by dividing the variant counts against the counts of 
the wild-type allele and taking the base 2 logarithm. 
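
The enrichment calculation described above can be written in a few lines. The first function follows the stated rule directly: divide each variant count by the wild-type count and take the base 2 logarithm. The second function, which compares two time points by subtracting the earlier value from the later one, is an assumption about how the samples might be combined. Variant names and counts are made up.

```python
import math

def log2_relative_to_wild_type(counts, wild_type="WT"):
    """log2(variant count / wild-type count) for every variant observed."""
    wt_count = counts[wild_type]
    return {v: math.log2(c / wt_count) for v, c in counts.items() if c > 0}

def enrichment_between_timepoints(early, late, wild_type="WT"):
    """Change in wild-type-normalized log2 abundance from early to late."""
    e = log2_relative_to_wild_type(early, wild_type)
    l = log2_relative_to_wild_type(late, wild_type)
    # Only variants observed at both time points can be compared.
    return {v: l[v] - e[v] for v in e if v in l}

if __name__ == "__main__":
    early = {"WT": 1000, "A12T": 500, "C30G": 250}
    late = {"WT": 2000, "A12T": 500, "C30G": 1000}
    print(enrichment_between_timepoints(early, late))
```

Negative values mark variants that are depleted relative to wild type during the competition, and positive values mark variants that outperform it.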

Data access 

The sequencing data from this study have been deposited in the 
NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) 
under accession numbers SRA051407 (ARS-seq), SRA051408 
(miniARS-seq), and SRA051406 (mutARS-seq).